Introduce LayerNorm optimization from latest Apex #277

Quentin-Anthony · 2023-11-01T19:21:24Z

My PR lets the user disable this LayerNorm optimization, but I suspect everyone will use it, so it's on by default.
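A minimal sketch of an on-by-default, opt-out switch using argparse. The flag name `--disable-mem-efficient-ln`, the destination `mem_efficient_ln`, and the helper `add_layernorm_args` are hypothetical illustrations, not necessarily what this PR adds. `action="store_false"` makes the stored value default to `True`, so the optimization stays enabled unless the user passes the flag.

```python
import argparse


def add_layernorm_args(parser):
    # Hypothetical flag: the memory-efficient LayerNorm path is enabled
    # by default and only turned off when the user opts out.
    group = parser.add_argument_group("layernorm")
    group.add_argument(
        "--disable-mem-efficient-ln",
        dest="mem_efficient_ln",
        action="store_false",  # default value is True
        help="Disable the memory-efficient Apex LayerNorm (enabled by default).",
    )
    return parser


if __name__ == "__main__":
    parser = add_layernorm_args(argparse.ArgumentParser())
    print(parser.parse_args([]).mem_efficient_ln)
    print(parser.parse_args(["--disable-mem-efficient-ln"]).mem_efficient_ln)
```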

Not backwards-compatible with older Apex. Do you need a version check or is this ok?

tjruwase · 2023-11-01T20:19:43Z

> Not backwards-compatible with older Apex. Do you need a version check or is this ok?

@Quentin-Anthony, thanks for this PR. But we do need backwards-compatibility, so please add a version check.

Quentin-Anthony · 2023-11-02T00:00:26Z

> > Not backwards-compatible with older Apex. Do you need a version check or is this ok?
>
> @Quentin-Anthony, thanks for this PR. But we do need backwards-compatibility, so please add a version check.

Apex doesn't have versioning yet, so I added support to manually inspect the function and see if the memory_efficient arg exists in FusedLayerNormAffineFunction.forward, which is a bit messy but does the job.
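A hedged sketch of this kind of ad-hoc feature check using Python's `inspect.signature`. The helper names `accepts_param` and `apex_supports_memory_efficient_ln` are hypothetical, and the import path `apex.normalization.fused_layer_norm` is assumed from Apex's source layout rather than taken from this PR's diff.

```python
import inspect


def accepts_param(fn, name):
    """Return True if fn's signature declares a parameter called `name`."""
    try:
        return name in inspect.signature(fn).parameters
    except (TypeError, ValueError):
        # Some builtins / C-extension callables expose no signature.
        return False


def apex_supports_memory_efficient_ln():
    """Ad-hoc check: does Apex's fused LayerNorm accept memory_efficient?"""
    try:
        # Assumed module path; FusedLayerNormAffineFunction is a
        # torch.autograd.Function whose forward() is a staticmethod.
        from apex.normalization.fused_layer_norm import FusedLayerNormAffineFunction
    except ImportError:
        return False  # Apex missing, or too old to provide this module.
    return accepts_param(FusedLayerNormAffineFunction.forward, "memory_efficient")


if __name__ == "__main__":
    # Stand-ins for an older and a newer forward() signature.
    def old_forward(ctx, input, weight, bias, normalized_shape, eps):
        pass

    def new_forward(ctx, input, weight, bias, normalized_shape, eps,
                    memory_efficient=False):
        pass

    print(accepts_param(old_forward, "memory_efficient"))
    print(accepts_param(new_forward, "memory_efficient"))
    print(apex_supports_memory_efficient_ln())
```

The optimization would then be enabled only when the user has not disabled it and this check returns True. Feature detection like this keeps working across unversioned Apex builds, at the cost of depending on a parameter name rather than a release number.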

Hopefully in the future NVIDIA/apex#1648 gets merged and we can just check apex.__version__

tjruwase · 2023-11-02T01:08:54Z

> Apex doesn't have versioning yet, so I added support to manually inspect the function and see if the memory_efficient arg exists in FusedLayerNormAffineFunction.forward, which is a bit messy but does the job.

Works for me. Thanks!

RuiWang1998 · 2023-11-24T03:58:39Z

Hi,

Author of NVIDIA/apex#1715 here. Thanks for incorporating this into the repo (as the default)! This is very exciting.

Moreover, I'm writing to let you know that the function called at https://github.com/Quentin-Anthony/Megatron-DeepSpeed-MS/blob/046319fecccfb8053ad3de5181e48f943ff14d27/megatron/model/fused_layer_norm.py#L96C18-L96C75 also gained the same memory_efficient feature in that same PR!
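A hedged sketch of extending the same signature check to another call site. `torch.autograd.Function.apply` is usually called with positional arguments, so this hypothetical helper `layer_norm_apply_args` appends `memory_efficient` positionally, but only after verifying that it is the parameter immediately following the base arguments (counting the leading `ctx`). The stand-in signatures below are illustrative assumptions, not copied from Apex.

```python
import inspect


def layer_norm_apply_args(forward_fn, base_args, memory_efficient):
    """Build the positional args for Function.apply, adding memory_efficient
    only when forward_fn declares it right after base_args."""
    args = tuple(base_args)
    names = list(inspect.signature(forward_fn).parameters)
    if "memory_efficient" not in names:
        return args  # older library: call with the original arguments
    # forward(ctx, *base_args, memory_efficient, ...): ctx occupies index 0.
    expected_index = len(args) + 1
    if names.index("memory_efficient") != expected_index:
        raise ValueError(
            f"memory_efficient is parameter #{names.index('memory_efficient')}, "
            f"expected #{expected_index}; cannot pass it positionally"
        )
    return args + (memory_efficient,)


if __name__ == "__main__":
    # Hypothetical newer forward() signature without a bias argument.
    def rms_forward(ctx, input, weight, normalized_shape, eps,
                    memory_efficient=False):
        pass

    print(layer_norm_apply_args(rms_forward, ("x", "w", (4,), 1e-5), True))
```

Failing loudly on an unexpected parameter position is a deliberate choice: silently passing the flag into the wrong slot would be much harder to debug.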

tjruwase · 2023-11-26T21:58:55Z

@RuiWang1998, thanks for the information. @Quentin-Anthony, do you have bandwidth to handle this?

Quentin-Anthony · 2023-11-26T23:56:22Z

> @RuiWang1998, thanks for the information. @Quentin-Anthony, do you have bandwidth to handle this?

Yep I'll take care of it

Quentin-Anthony added 2 commits November 1, 2023 12:13

34c9b34 Introduce LayerNorm optimization from NVIDIA/apex#1715
7c59960 Fix args call

Quentin-Anthony requested review from jeffra, tjruwase, ShadenSmith, conglongli, awan-10, eltonzheng, minjiaz, RezaYazdaniAminabadi, duli2012, mrwyattii, yaozhewei, arashb, xiaoxiawu-microsoft and GuanhuaWang as code owners November 1, 2023 19:21

Quentin-Anthony added 2 commits November 1, 2023 14:37

8434496 Ad-hoc apex version check
046319f Remove unnecessary TransformerConfig arg

tjruwase approved these changes Nov 2, 2023


tjruwase merged commit ef13d09 into microsoft:main Nov 2, 2023
1 check passed

RuiWang1998 mentioned this pull request Dec 21, 2023

Make fused normalization functions backward-compatible NVIDIA/apex#1760 (merged)
